Discretization and Grouping: Preprocessing Steps for Data Mining
Abstract
Unlike the on-line discretization that a number of machine learning (ML) algorithms perform while building decision trees or decision rules, we propose off-line algorithms for discretizing numerical attributes and grouping values of nominal attributes. The number of intervals produced by discretization depends only on the data; the number of groups corresponds to the number of classes. Since both discretization and grouping are done with respect to the goal classes, the algorithms are suitable only for classification/prediction tasks. As a side effect of the off-line processing, the number of objects in the datasets and the number of attributes may be reduced. It should also be mentioned that although the discretization procedure was originally proposed for the Kex system, the algorithms perform well together with other machine learning algorithms.
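The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea of class-driven, off-line preprocessing, not the authors' procedure. It assumes a simple majority-class rule: each nominal value joins the group of the class it most often occurs with (so there are at most as many groups as classes), and cut points for a numerical attribute are placed midway between adjacent values whose majority classes differ (so the number of intervals depends only on the data). The function names `group_nominal_values` and `discretize_by_class` are hypothetical.

```python
from collections import Counter, defaultdict


def group_nominal_values(values, classes):
    """Map each attribute value to the class it most often occurs with.

    Assumption for illustration: a plain majority-class rule. Ties are
    broken by the sorted class label so the result is deterministic.
    """
    counts = defaultdict(Counter)
    for value, cls in zip(values, classes):
        counts[value][cls] += 1
    return {
        value: min(cnt, key=lambda c: (-cnt[c], c))
        for value, cnt in counts.items()
    }


def discretize_by_class(values, classes):
    """Return cut points for a numerical attribute.

    Assumption for illustration: a cut is placed midway between two
    adjacent distinct values whenever their majority classes differ.
    """
    majority = group_nominal_values(values, classes)
    ordered = sorted(majority)
    cuts = []
    for low, high in zip(ordered, ordered[1:]):
        if majority[low] != majority[high]:
            cuts.append((low + high) / 2)
    return cuts


if __name__ == "__main__":
    colours = ["red", "red", "blue", "green", "green", "blue"]
    labels = ["a", "a", "b", "a", "b", "b"]
    print(group_nominal_values(colours, labels))

    ages = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    outcome = ["n", "n", "n", "y", "y", "n"]
    print(discretize_by_class(ages, outcome))
```

Because both functions run once over the whole training set before any learner sees it, the output (a value-to-group map and a list of cut points) can be reused with any classifier, which is the practical appeal of off-line preprocessing mentioned above.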
Similar resources
An Evolutionary Multi-objective Discretization based on Normalized Cut
Learning models and their results depend on the quality of the input data. If raw data is not properly cleaned and structured, the results tend to be incorrect. Therefore, discretization, as one of the preprocessing techniques, plays an important role in learning processes. The most important challenge in the discretization process is to reduce the number of features’ values. This operat...
Implementation of Preprocessing Techniques in Datamining
carefully screened can produce misleading results. Thus, the raw data needs to be preprocessed before data mining, and this step can often take a considerable amount of processing time. Usually, data from experiments are not suitable for data mining tasks, because the raw data may contain out-of-range values, impossible data combinations, missing values, etc. Analyzing data withou...
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
Discretization of Numerical Attributes Preprocessing for Machine Learning
The area of knowledge discovery and data mining is growing rapidly. A large number of methods are employed to mine knowledge. Several of the methods rely on discrete data. However, most datasets used in real applications have attributes with continuous values. To make the data mining techniques useful for such datasets, discretization is performed as a preprocessing step o...
Result Comparison of Two Rough Set Based Discretization Algorithms
The area of knowledge discovery and data mining is growing rapidly. A large number of methods are employed to mine knowledge. Many of the methods rely on discrete data. However, most of the datasets used in real applications have attributes with continuous values. To make the data mining techniques useful for such datasets, discretization is performed as a preprocessing step of the data mining. ...
Publication date: 1998